WordNet 2 - A Morphologically And Semantically Enhanced Resource
Abstract
This paper presents an on-going project intended to enhance WordNet morphologically and semantically. The motivation for this work stems from the current limitations of WordNet when used as a linguistic knowledge base. We envision a software tool that automatically parses the conceptual defining glosses, attributing part-of-speech tags and phrasal brackets. The nouns, verbs, adjectives and adverbs from every definition are then disambiguated and linked to the corresponding synsets. This increases the connectivity between synsets, allowing the retrieval of topically related concepts. Furthermore, the tool transforms the glosses, first into logical forms and then into semantic forms. Using derivational morphology, new links are added between the synsets.

1 Motivation

WordNet has already been recognized as a valuable resource in the human language technology and knowledge processing communities. Its applicability has been cited in more than 200 papers, and systems have been implemented using WordNet. A WordNet bibliography is maintained at the University of Pennsylvania (http://www.cis.upenn.edu/~josephr/wnbiblio.html). In Europe, WordNet is being used to develop a multilingual database with basic semantic relations between words for several European languages (the EuroWordNet project).

Capabilities

WordNet was conceived as a machine-readable dictionary, following psycholinguistic principles. Unlike standard alphabetical dictionaries, which organize vocabularies using morphological similarities, WordNet structures lexical information in terms of word meanings. WordNet maps word forms to word senses using the syntactic category as a parameter. Although it covers only four parts of speech (nouns, verbs, adjectives and adverbs), it encompasses a large majority of English words (http://www.cogsci.princeton.edu/~wn). Words of the same syntactic category that can be used to express the same meaning are grouped into a single synonym set, called a synset. Words with multiple meanings (polysemous) belong to multiple synsets. An
important part of the 99,643 synsets encoded in WordNet 1.6 contains word collocations, thus representing complex nominals (e.g. the synset {manufacturer, maker, manufacturing business}), complex verbals (e.g. the synset {leave office, quit, step down}), complex adjectivals (e.g. the synset {true, dead on target}) or complex adverbials (e.g. the synset {out of hand, beyond control}). The representation of collocations as synset entries provides for their semantic interpretation.

Words and concepts are further connected through a small set of lexico-semantic relations. The dominant semantic relation is hypernymy, which structures the noun concepts into 11 hierarchies and the verb concepts into 512 hierarchies. Three meronym relations are encoded between noun concepts: the has_member, the has_stuff and the has_part relations. Logical operations between events or entities are modeled through entailment and cause_to relations between verb concepts, or antonymy relations among nouns, verbs, adjectives or adverb words. There are only a few morphologically motivated connections between words, known as pertainym relations.

Limitations

The main weaknesses of WordNet cited in the literature are:
1. The lack of connections between noun and verb hierarchies.
2. Limited number of connections between topically related words.
3. The lack of morphological relations.
4. The absence of thematic relations/selectional restrictions.
5. Some concepts (word senses) and relations are missing.
6. Since glosses were written manually, sometimes there is a lack of uniformity and consistency in the definitions.

The key idea in our project is to put to work the rich source of information contained in glosses, which now can be used only by humans to read the definition of synsets. For example, WordNet 1.6 lists the concept {cat, true cat} with the gloss (feline mammal usually having thick soft fur and being unable to roar, domestic cats, wildcats). Currently, from a concept like this, only a few other concepts could be reached.
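
The effect of linking gloss words to synsets can be illustrated with a toy graph. The sketch below is not the project's tool: the node names, the `GLOSS_LINKS` table and the helper functions `merge` and `reachable` are hypothetical, and the gloss links are hand-picked from the {cat, true cat} gloss rather than produced by any disambiguation step. It only shows how adding gloss-derived links enlarges the set of concepts reachable from a synset.

```python
from collections import deque

# Hypothetical toy data: each name stands in for a whole synset.
HYPERNYM_LINKS = {"cat": ["feline"], "feline": ["carnivore"]}
# Assumed output of gloss disambiguation for the gloss of {cat, true cat}.
GLOSS_LINKS = {"cat": ["feline", "mammal", "fur", "roar"]}


def merge(*relations):
    """Combine several relation tables into one adjacency map of sets."""
    merged = {}
    for relation in relations:
        for source, targets in relation.items():
            merged.setdefault(source, set()).update(targets)
    return merged


def reachable(start, links, max_hops):
    """Breadth-first search: concepts within max_hops of start, excluding start."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue
        for neighbour in links.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    seen.discard(start)
    return seen


before = reachable("cat", merge(HYPERNYM_LINKS), max_hops=1)
after = reachable("cat", merge(HYPERNYM_LINKS, GLOSS_LINKS), max_hops=1)
print(sorted(before))  # concepts reachable through hypernymy only
print(sorted(after))   # concepts reachable once gloss links are added
```

With real data, the same traversal would also follow links from hypernym glosses and from glosses that mention the concept.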
In Extended WordNet, the concept {cat, true cat} will be related to 215 other concepts (10 from its own gloss, 38 from the glosses of its hypernyms, 25 concepts that use it in their glosses as a defining concept, plus other 142 concepts with which the concept interacts in these 25 glosses). This level of information is rich enough to presume that the Extended WordNet will work well as a knowledge base for common-sense reasoning.

2 Related work

Machine Readable Dictionaries (MRDs) have long been recognized as valuable resources in computational linguistics. In their paper, Ide and Veronis (Ide and Veronis, 1993) projected a rather pessimistic outlook for the utility of MRDs as knowledge sources, a view that has impeded the enthusiasm of some researchers. Wilks et al. (1996) make a strong argument in favor of using MRDs and share their positive experience with using some dictionaries. The MindNet project at Microsoft aims at fully automating the development of a very large lexical knowledge base using two MRDs: the Longman Dictionary of Contemporary English (LDOCE) and the American Heritage Third Edition (AHD3). Many technical aspects of this project are rooted in the works of Vanderwende (Vanderwende 1996) and Richardson (Richardson 1997).

3 Word sense disambiguation of gloss concepts

There are several differences between gloss disambiguation and text disambiguation. A major difference is that in our project we know the meaning of each gloss, namely the synset to which a gloss applies. Second, the glosses contain a definition, comments, and one or more examples. We address the word sense disambiguation problem by using three complementary methods: (a) heuristics, (b) conceptual density, and (c) statistics on large corpora. The first two methods rely entirely on the information contained in WordNet, while the third one uses other corpora. Specifically, the sources of knowledge available to us are: (1) lexical information that includes part of speech, position of words (i.e. head word), and
lexical relations; (2) collocations and syntactic patterns; (3) synset to which a gloss belongs; (4) hypernyms of synset and their glosses; (5) synsets of polysemous words and their glosses; (6) hypernyms of synsets of polysemous words, and their glosses; and so on.

Method 1: Classes of heuristics for word sense disambiguation

A suitable technique for disambiguating dictionaries is to rely on heuristics able to cope with different sources of information. Work in this area was done by Ravin (Ravin 1990) in a similar project at IBM, (Klavans et al. 1990), and others. We present now some of the heuristics used by us.

1. Class Hypernyms. A way of explaining a concept is to specialize a more general concept (i.e. a hypernym). It is likely that an explanation begins with a phrase whose head is one of its hypernyms, and the features are expressed either as attributes in the same phrase or as phrases attached to the first phrase.
Example: The gloss of synset {intrusion} is (entrance by force or without permission or welcome).
• entrance#3, the head of the first phrase, is a hypernym of intrusion, thus we pick sense 3 of noun entrance. (The senses in WordNet are ranked according to their frequency of occurrence in the Brown corpus; entrance#3 means sense 3 of word entrance.)

2. Class Linguistic Parallelism. It is likely that the syntactic parallelism of two words translates into semantic parallelism, and the words may have a common hypernym, or one is a hypernym of the other. For adjectives, the hypernymy is replaced by the similarity relation. Other heuristics in this class check whether or not two polysemous words belong to the same synset, or one is a hypernym of the other, or if they belong to the same hierarchy.
Example: The gloss of {interaction} is (a mutual or
reciprocal action).
• Adjective reciprocal has only one sense in WordNet 1.6, whereas mutual has two senses. But we find that between sense 2 of mutual and reciprocal there is a similar link in WordNet 1.6, thus pick
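
The two heuristic classes described above can be sketched on a toy sense inventory. This is an illustration under stated assumptions, not the authors' implementation: the tables `SENSES`, `HYPERNYMS` and `SIMILAR`, the label `intrusion#1`, and both function names are hypothetical. Only the two facts taken from the examples (entrance#3 is a hypernym of intrusion; sense 2 of mutual has a similar link to reciprocal) come from the text.

```python
# Hypothetical toy inventory; sense labels use the word#n notation from the text.
SENSES = {
    "entrance": ["entrance#1", "entrance#2", "entrance#3"],
    "mutual": ["mutual#1", "mutual#2"],
    "reciprocal": ["reciprocal#1"],
}
# Direct hypernym links; "intrusion#1" is an assumed label for synset {intrusion}.
HYPERNYMS = {"intrusion#1": {"entrance#3"}}
# Similarity links between adjective senses (stored once, treated as symmetric).
SIMILAR = {("mutual#2", "reciprocal#1")}


def all_hypernyms(synset):
    """Collect every ancestor of synset by following hypernym links upward."""
    ancestors, stack = set(), [synset]
    while stack:
        for parent in HYPERNYMS.get(stack.pop(), ()):
            if parent not in ancestors:
                ancestors.add(parent)
                stack.append(parent)
    return ancestors


def hypernym_heuristic(defined_synset, head_word):
    """Class Hypernyms: pick the sense of the gloss head that is an ancestor."""
    ancestors = all_hypernyms(defined_synset)
    for sense in SENSES.get(head_word, []):
        if sense in ancestors:
            return sense
    return None  # heuristic does not apply


def parallelism_heuristic(word_a, word_b):
    """Class Linguistic Parallelism (adjectives): find senses joined by a similar link."""
    for sense_a in SENSES.get(word_a, []):
        for sense_b in SENSES.get(word_b, []):
            if (sense_a, sense_b) in SIMILAR or (sense_b, sense_a) in SIMILAR:
                return sense_a, sense_b
    return None  # no linked pair found


print(hypernym_heuristic("intrusion#1", "entrance"))
print(parallelism_heuristic("mutual", "reciprocal"))
```

Each function returns `None` when its heuristic does not apply, so a caller could fall back to the other methods named in the text (conceptual density, corpus statistics). The parallelism sketch only returns the linked pair of senses; what the source does with that pair is cut off.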
Similar resources
Putting Semantics into WordNet's "Morphosemantic" Links
To add to WordNet's contents, and specifically to aid automatic reasoning with WordNet, we classify and label the current relations among derivationally and semantically related noun-verb pairs. Manual inspection of thousands of pairs shows that there is no one-to-one mapping of form and meaning for derivational affixes, which exhibit far less regularity than expected. We determine a set of sem...
Serelex: Search and Visualization of Semantically Related Words
We present a system which provides, given a query, a list of semantically related terms. The terms are ranked according to an original semantic similarity measure learned from a huge corpus. The system performs comparably to dictionary-based baselines with no need of any semantic resource such as WordNet. The further study shows that users are completely satisfied with 70% of query results. D...
Linguistic Linked Data for Sentiment Analysis
In this paper we describe the specification of a model for the semantically interoperable representation of language resources for sentiment analysis. The model integrates ‘lemon’, an RDF-based model for the specification of ontology-lexica (Buitelaar et al. 2009), which is used increasingly for the representation of language resources as Linked Data, with 'Marl', an RDF-based model for the rep...
NameNet: a Self-Improving Resource for Name Classification
This paper presents a semantically structured resource of more than 1,600 Name Classes. This structure is based on the noun hypernymy hierarchies in WordNet, expanded and validated by corpus evidence collected from the World Wide Web. The set of seed examples provided by WordNet is bootstrapped and then used to automatically construct an annotated training corpus for each Name Class. The resultin...
The Making of Ancient Greek WordNet
This paper describes the process of creation and review of a new lexico-semantic resource for classical studies: Ancient Greek WordNet. The candidate sets of synonyms (synsets) are extracted from Greek-English dictionaries, on the assumption that Greek words translated by the same English word or phrase have a high probability of being synonyms or at least semantically closely related. The pr...
On WordNet Semantic Classes and Dependency Parsing
This paper presents experiments with WordNet semantic classes to improve dependency parsing. We study the effect of semantic classes in three dependency parsers, using two types of constituency-to-dependency conversions of the English Penn Treebank. Overall, we can say that the improvements are small and not significant using automatic POS tags, contrary to previously published results using gol...
Publication date: 1999